Smart Paper Notes

Dense Representative Tooth Landmark/axis Detection Network on 3D Model

Guangshun Wei, Zhiming Cui, Jie Zhu, Lei Yang, Yuanfeng Zhou, Pradeep Singh, Min Gu, Wenping Wang

Categories: Artificial Intelligence | Computer Vision

2021-11-08

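Artificial intelligence (AI) technology is increasingly used in digital orthodontics, but one remaining challenge is detecting tooth landmarks and axes automatically and accurately. This is partly because of their complex geometric definitions and partly because of the large variation among individual teeth and across different tooth types. We therefore propose a deep learning method, trained on a dataset labeled by professional dentists, for tooth landmark/axis detection on tooth models, which is crucial for orthodontic treatment. Our method extracts not only tooth landmarks in the form of points (e.g., cusps) but also axes that measure tooth angulation and inclination. The proposed network takes a 3D tooth model as input and predicts various types of tooth landmarks and axes. Specifically, we encode the landmarks and axes as dense fields defined on the surface of the tooth model. This design choice, together with a set of added components, makes the proposed network better suited to extracting sparse landmarks from a given 3D tooth model. The proposed method is evaluated extensively on a set of dental models prepared by experienced dentists. The results show that our method produces tooth landmarks with high accuracy. We examine and justify our method through comparison with state-of-the-art methods and through ablation studies.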
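
To illustrate the dense-field idea, the sketch below encodes one sparse landmark as a per-vertex heat value that decays with distance from the landmark, then recovers the landmark as the vertex with the highest value. The Gaussian form, the `sigma` parameter, the use of Euclidean rather than geodesic distance, and the function names are all assumptions made for illustration; the abstract does not specify how the fields are defined.

```python
import math

def encode_landmark(vertices, landmark, sigma=1.0):
    """Return one heat value per vertex: 1.0 at the landmark, decaying with distance.

    Assumption: a Gaussian of Euclidean distance stands in for the dense field.
    """
    field = []
    for v in vertices:
        d2 = sum((a - b) ** 2 for a, b in zip(v, landmark))
        field.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    return field

def decode_landmark(vertices, field):
    """Recover a sparse landmark as the vertex with the largest heat value."""
    best = max(range(len(vertices)), key=lambda i: field[i])
    return vertices[best]

# Toy "tooth surface" with four vertices; the last one plays the role of a cusp.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.5)]
cusp = (1.0, 1.0, 0.5)
field = encode_landmark(vertices, cusp, sigma=0.5)
recovered = decode_landmark(vertices, field)
```

A network trained to regress such a field receives a supervision signal at every vertex instead of at a handful of points, which is one plausible reading of why a dense encoding helps with sparse landmarks.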

translated by Google Translate

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan

Categories: Computer Vision | Machine Learning

2022-06-16

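Despite considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of model capabilities is hampered by the lack of a large-scale benchmark covering diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which limits the power of modern deep models and makes it difficult to provide a comprehensive and fair estimate of different methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new, challenging dataset. We have made the dataset, benchmark server, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.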

Watch Me Calibrate My Force-Sensing Shoes!

Yuanfeng Han, Boren Jiang, Gregory S. Chirikjian

Categories: Robotics

2022-02-26

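This paper presents a novel method for smaller-sized humanoid robots to self-calibrate their foot force sensors. The method consists of two steps: 1. The robot is commanded to move along planned whole-body trajectories in different double-support configurations. 2. The sensor parameters are determined by optimization, minimizing the error between the measured and modeled center of pressure (CoP) and ground reaction force (GRF) during the robot's motion. This is the first autonomous calibration method proposed for foot force-sensing devices in smaller humanoid robots. In addition, we introduce a high-accuracy manual calibration method to establish CoP ground truth, which is used to validate the CoP measured after self-calibration. The results show that self-calibration can accurately estimate CoP and GRF without any manual intervention. Our method is demonstrated using a NAO humanoid platform and our previously presented force-sensing shoes.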
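
The second step can be sketched as a small least-squares problem. The example below is a deliberately simplified, one-dimensional illustration, not the paper's formulation: it assumes a single foot with two sensors (heel and toe) at known positions, one unknown gain per sensor, and a model that supplies the expected vertical GRF and CoP for each pose. The function name `calibrate_gains` and all numeric values are hypothetical.

```python
def calibrate_gains(samples, x_heel, x_toe):
    """Least-squares fit of two sensor gains.

    samples: list of (raw_heel, raw_toe, model_force, model_cop) tuples.
    Each sample contributes two equations that are linear in the gains:
      g_heel*raw_heel        + g_toe*raw_toe       = model_force            (force balance)
      g_heel*raw_heel*x_heel + g_toe*raw_toe*x_toe = model_force*model_cop  (moment balance)
    """
    rows, rhs = [], []
    for raw_heel, raw_toe, force, cop in samples:
        rows.append((raw_heel, raw_toe))
        rhs.append(force)
        rows.append((raw_heel * x_heel, raw_toe * x_toe))
        rhs.append(force * cop)
    # Solve the 2x2 normal equations (A^T A) g = A^T b with Cramer's rule.
    a11 = sum(a * a for a, _ in rows)
    a12 = sum(a * b for a, b in rows)
    a22 = sum(b * b for _, b in rows)
    b1 = sum(a * y for (a, _), y in zip(rows, rhs))
    b2 = sum(b * y for (_, b), y in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Synthetic data generated from known gains, so the fit can be checked.
true_gains = (2.0, 0.5)
x_heel, x_toe = -0.05, 0.10  # sensor positions along the foot, in meters
samples = []
for f_heel, f_toe in [(10.0, 30.0), (25.0, 15.0), (5.0, 35.0)]:
    force = f_heel + f_toe
    cop = (f_heel * x_heel + f_toe * x_toe) / force
    samples.append((f_heel / true_gains[0], f_toe / true_gains[1], force, cop))

gains = calibrate_gains(samples, x_heel, x_toe)
```

Varying the load distribution across poses, as the planned trajectories do in the first step, is what keeps the normal equations well conditioned; if every pose loaded the sensors in the same ratio, the gains could not be separated.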

Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao, Di Jiang

Categories: Artificial Intelligence | Natural Language Processing

2022-01-04

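Speech-based input has gained significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient means of human-computer interaction. This paper works towards designing more effective speech-based interfaces for querying structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and translate it directly into structured query language (SQL) statements. A naive solution to this problem works in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from error compounding between the two components, resulting in limited performance. To handle these challenges, we further propose a novel end-to-end neural architecture named SpeechSQLNet that translates human speech directly into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to synthesize SQL directly from arbitrary natural language questions, rather than from a natural-language-based version of SQL or its variants with a limited SQL grammar. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive experimental evaluation on this dataset shows that SpeechSQLNet can synthesize high-quality SQL queries directly from human speech, outperforming various competitive counterparts as well as cascaded methods in terms of exact-match accuracy.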
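
As a rough illustration of the exact-match metric mentioned above, the sketch below counts a prediction as correct only when it equals the reference query after light normalization. The normalization rules (lowercasing, collapsing whitespace, dropping a trailing semicolon) and the function names are assumptions; the abstract does not describe how matches are computed, and real text-to-SQL benchmarks typically compare parsed query components rather than strings. Note that lowercasing would also alter quoted string literals, which this toy version ignores.

```python
def normalize_sql(query):
    """Lowercase, drop a trailing semicolon, and collapse runs of whitespace."""
    return " ".join(query.strip().rstrip(";").lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_sql(p) == normalize_sql(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

# Hypothetical model outputs and gold queries.
predictions = [
    "SELECT name FROM singer;",
    "select count(*) from singer",
    "SELECT age FROM singer",
]
references = [
    "select name  from singer",
    "SELECT COUNT(*) FROM singer",
    "SELECT name FROM singer",
]
accuracy = exact_match_accuracy(predictions, references)
```

Here the first two predictions match their references after normalization and the third selects the wrong column, so two of the three queries count as exact matches.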

Cluster-guided Contrastive Graph Clustering Network

Xihong Yang, Yue Liu, Sihang Zhou, Siwei Wang, Wenxuan Tu, Qun Zheng, Xinwang Liu, Liming Fang, En Zhu

Categories: Machine Learning

2023-01-03

Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls samples from the same cluster closer while pushing away those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
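
The sample construction described above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the CCGC objective itself: positives are cross-view pairs of high-confidence nodes that share a cluster, negatives are cross-view pairs of centers from different clusters, and the loss is a simple sum of squared penalties chosen here for clarity. The function names and the toy embeddings are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cluster_centers(embeddings, labels, confident):
    """Mean embedding of the high-confidence nodes in each cluster."""
    groups = {}
    for i in confident:
        groups.setdefault(labels[i], []).append(embeddings[i])
    return {k: [sum(col) / len(vs) for col in zip(*vs)] for k, vs in groups.items()}

def cluster_guided_loss(view1, view2, labels, confident):
    # Positives: high-confidence nodes of the same cluster, taken across views.
    pos = [(1.0 - cosine(view1[i], view2[j])) ** 2
           for i in confident for j in confident if labels[i] == labels[j]]
    # Negatives: centers of different high-confidence clusters, across views.
    c1 = cluster_centers(view1, labels, confident)
    c2 = cluster_centers(view2, labels, confident)
    neg = [cosine(c1[a], c2[b]) ** 2 for a in c1 for b in c2 if a != b]
    pos_term = sum(pos) / len(pos)
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term

labels = [0, 0, 1, 1, 0]
confident = [0, 1, 2, 3]  # node 4 is treated as low-confidence and ignored
separated_v1 = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.5, 0.5)]
separated_v2 = [(1.0, 0.1), (0.8, 0.2), (0.1, 1.0), (0.0, 1.0), (0.4, 0.6)]
collapsed = [(1.0, 1.0)] * 5  # every node mapped to the same point

good_loss = cluster_guided_loss(separated_v1, separated_v2, labels, confident)
bad_loss = cluster_guided_loss(collapsed, collapsed, labels, confident)
```

The collapsed embeddings satisfy the positive term perfectly but are heavily penalized by the negative term, which is the role the cluster centers play in keeping clusters apart.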

Explaining Imitation Learning through Frames

Boyuan Zheng, Jianlong Zhou, Chunjie Liu, Yiqiao Li, Fang Chen

Categories: Machine Learning | Computer Vision

2023-01-03

As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
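
The mask-retrain-score loop described above can be sketched as follows. This is a minimal illustration, not the R2RISE implementation: the retraining step is replaced by a hypothetical callback `train_and_evaluate` that takes a binary frame mask and returns an environment return, and the importance of a frame is taken to be the average return over the masks that kept it. The parameter names, the averaging rule, and the toy evaluator are all assumptions.

```python
import random

def frame_importance(num_frames, train_and_evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """Average return observed over the random masks that kept each frame."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        # In a real setting this call would retrain the IL model on the masked
        # demonstrations and roll out the resulting policy in the environment.
        score = train_and_evaluate(mask)
        for t, kept in enumerate(mask):
            if kept:
                weighted[t] += score
                counts[t] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]

# Toy stand-in for retraining: each kept frame adds a hidden contribution.
hidden_contribution = [0.0, 5.0, 0.0, 1.0, 0.0]

def toy_train_and_evaluate(mask):
    return sum(h * m for h, m in zip(hidden_contribution, mask))

importance = frame_importance(len(hidden_contribution), toy_train_and_evaluate)
most_important = max(range(len(importance)), key=lambda t: importance[t])
```

Because the callback is treated as a black box, the same loop applies to any IL algorithm, which matches the model-agnostic framing in the abstract; the cost is one full retraining run per mask.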

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

Zhongtao Chen, Chenghu Mi, Siwei Duo, Jingfei He, Yatong Zhou

Categories: Natural Language Processing

2023-01-03

Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. To let topic extraction facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To let clustering promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and from achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) that integrates text clustering and topic extraction into a unified framework and can simultaneously achieve high-quality clustering results and extract topics from each cluster. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, its self-attention architecture lets it attend strongly to topic-related words for topic extraction. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within it.

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou

Categories: Computer Vision | Machine Learning

2023-01-02

An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, an approach dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.

PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis

Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, Yizhou Yu

Categories: Computer Vision | Machine Learning

2023-01-02

Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration to explicitly encode more pixel-level information into high-level semantics. We also address the preservation of scale information, which is a powerful aid to image understanding but has drawn little attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.

Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images

Kun Zhao, Qian Gao, Siyuan Hao, Jie Sun, Lijian Zhou

Categories: Computer Vision | Artificial Intelligence

2023-01-02

Because they offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, the lack of research on explicitly quantifying the data quality of each view during fusion leaves these models hard to explain, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to the task of aerial-ground dual-view remote sensing scene classification in order to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
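
A small numerical sketch may help make the uncertainty-weighted fusion concrete. It uses the common evidential deep learning formulation in which non-negative per-class evidence e_k defines a Dirichlet distribution with parameters e_k + 1, belief b_k = e_k / S, and uncertainty u = K / S, where S is the sum of the Dirichlet parameters and K the number of classes. The fusion rule shown here, weighting each view's expected class probabilities by 1 - u, is an assumption chosen for illustration and is not necessarily the strategy proposed in the paper; the function names and evidence values are hypothetical.

```python
def opinion(evidence):
    """Belief masses and uncertainty from non-negative per-class evidence."""
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # S = sum over classes of (e_k + 1)
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty

def fuse(aerial_evidence, ground_evidence):
    """Weight each view's expected class probabilities by its confidence (1 - u)."""
    b_a, u_a = opinion(aerial_evidence)
    b_g, u_g = opinion(ground_evidence)
    k = len(b_a)
    # Expected Dirichlet probability per class: (e_k + 1) / S = b_k + u / K.
    p_a = [b + u_a / k for b in b_a]
    p_g = [b + u_g / k for b in b_g]
    w_a, w_g = 1.0 - u_a, 1.0 - u_g
    total = w_a + w_g
    if total == 0.0:  # both views carry no evidence at all
        w_a = w_g = 0.5
    else:
        w_a, w_g = w_a / total, w_g / total
    return [w_a * pa + w_g * pg for pa, pg in zip(p_a, p_g)]

aerial = [40.0, 2.0, 1.0]  # strong evidence for class 0
ground = [0.5, 1.0, 0.5]   # weak evidence, leaning towards class 1
_, u_aerial = opinion(aerial)
_, u_ground = opinion(ground)
fused = fuse(aerial, ground)
predicted = max(range(len(fused)), key=lambda c: fused[c])
```

In this toy case the ground view disagrees with the aerial view but carries far less evidence, so its higher uncertainty reduces its weight and the fused prediction follows the aerial view.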
